Overview

Dataset statistics

Number of variables: 14
Number of observations: 10865
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 1.2 MiB
Average record size in memory: 112.0 B
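
The memory figures above are consistent with each of the 14 columns storing one 8-byte value per row. The report does not state the column dtypes, so the 8-byte width is an assumption; the sketch below only checks that the arithmetic agrees with the reported numbers.

```python
# Figures reported in the dataset statistics.
n_variables = 14
n_observations = 10865

# Assumption (not stated in the report): each column stores one
# 8-byte value or pointer per row.
BYTES_PER_VALUE = 8

record_bytes = n_variables * BYTES_PER_VALUE
total_mib = n_observations * record_bytes / 2**20

print(f"record size: {record_bytes} B")
print(f"total size:  {total_mib:.1f} MiB")
```

Under that assumption the record size comes out at 112 B and the total rounds to 1.2 MiB, matching the report.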

Variable types

Numeric: 10
Categorical: 4

Alerts

original_title has a high cardinality: 10571 distinct values [High cardinality]
director has a high cardinality: 5068 distinct values [High cardinality]
genres has a high cardinality: 2040 distinct values [High cardinality]
release_date has a high cardinality: 5909 distinct values [High cardinality]
df_index is highly correlated with release_year [High correlation]
popularity is highly correlated with revenue and 1 other field [High correlation]
budget is highly correlated with revenue and 3 other fields [High correlation]
revenue is highly correlated with popularity and 4 other fields [High correlation]
vote_count is highly correlated with popularity and 4 other fields [High correlation]
release_year is highly correlated with df_index [High correlation]
budget_adj is highly correlated with budget and 3 other fields [High correlation]
revenue_adj is highly correlated with budget and 3 other fields [High correlation]
df_index is uniformly distributed [Uniform]
original_title is uniformly distributed [Uniform]
df_index has unique values [Unique]
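
The cardinality alerts can be put in proportion by dividing each distinct count by the number of observations. The report does not state the threshold that triggers the alert, so the sketch below does not assume one; it only recomputes the distinct percentages from the counts listed in the alerts.

```python
# Counts copied from the alerts and the dataset statistics.
n_observations = 10865
distinct_counts = {
    "original_title": 10571,
    "director": 5068,
    "genres": 2040,
    "release_date": 5909,
}

def distinct_pct(n_distinct, n_rows):
    """Share of rows that carry a distinct value, in percent (1 decimal)."""
    return round(100 * n_distinct / n_rows, 1)

percentages = {
    name: distinct_pct(count, n_observations)
    for name, count in distinct_counts.items()
}

for name, pct in percentages.items():
    print(f"{name}: {pct}%")
```

These values agree with the "Distinct (%)" figures reported for each categorical variable.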

Reproduction

Analysis started: 2022-10-07 23:45:50.344572
Analysis finished: 2022-10-07 23:47:21.560679
Duration: 1 minute and 31.22 seconds
Software version: pandas-profiling v3.3.0

Variables

df_index
Real number (ℝ≥0)

HIGH CORRELATION
UNIFORM
UNIQUE

Distinct: 10865
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5432.807639
Minimum: 0
Maximum: 10865
Zeros: 1
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 85.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 543.2
Q1: 2717
Median: 5433
Q3: 8149
95-th percentile: 10321.8
Maximum: 10865
Range: 10865
Interquartile range (IQR): 5432

Descriptive statistics

Standard deviation: 3136.868785
Coefficient of variation (CV): 0.5773936781
Kurtosis: -1.199908013
Mean: 5432.807639
Median Absolute Deviation (MAD): 2716
Skewness: -0.0001828882776
Sum: 59027455
Variance: 9839945.777
Monotonicity: Strictly increasing
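
Several of the statistics above are derived from others using standard definitions: CV is the standard deviation divided by the mean, variance is the squared standard deviation, IQR is Q3 minus Q1, and range is maximum minus minimum. A minimal sketch, using the df_index figures as inputs, shows how they relate.

```python
# Inputs copied from the df_index statistics.
mean = 5432.807639
std = 3136.868785
q1, q3 = 2717, 8149
minimum, maximum = 0, 10865

cv = std / mean                  # coefficient of variation
variance = std ** 2              # variance from the standard deviation
iqr = q3 - q1                    # interquartile range
value_range = maximum - minimum  # range

print(f"CV:       {cv:.6f}")
print(f"Variance: {variance:.2f}")
print(f"IQR:      {iqr}")
print(f"Range:    {value_range}")
```

The results match the reported values up to the rounding of the inputs.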
[Figure: histogram with fixed size bins (bins=50)]
ValueCountFrequency (%)
20471
 
< 0.1%
33631
 
< 0.1%
74731
 
< 0.1%
54241
 
< 0.1%
95181
 
< 0.1%
33711
 
< 0.1%
13221
 
< 0.1%
74651
 
< 0.1%
54161
 
< 0.1%
95101
 
< 0.1%
Other values (10855)10855
99.9%
ValueCountFrequency (%)
01
< 0.1%
11
< 0.1%
21
< 0.1%
31
< 0.1%
41
< 0.1%
51
< 0.1%
61
< 0.1%
71
< 0.1%
81
< 0.1%
91
< 0.1%
ValueCountFrequency (%)
108651
< 0.1%
108641
< 0.1%
108631
< 0.1%
108621
< 0.1%
108611
< 0.1%
108601
< 0.1%
108591
< 0.1%
108581
< 0.1%
108571
< 0.1%
108561
< 0.1%

original_title
Categorical

HIGH CARDINALITY
UNIFORM

Distinct: 10571
Distinct (%): 97.3%
Missing: 0
Missing (%): 0.0%
Memory size: 85.0 KiB
Hamlet: 4
A Christmas Carol: 3
Shelter: 3
Carrie: 3
Hercules: 3
Other values (10566): 10849

Length

Max length: 104
Median length: 70
Mean length: 16.00312931
Min length: 1

Characters and Unicode

Total characters: 173874
Distinct characters: 164
Distinct categories: 19
Distinct scripts: 2
Distinct blocks: 6
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
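
As a rough illustration of how such character properties can be read, Python's standard `unicodedata` module returns the general category of a code point as a two-letter code (for example `Lu` for Uppercase Letter, `Ll` for Lowercase Letter, `Zs` for Space Separator, `Po` for Other Punctuation). This is a sketch of the idea, not necessarily how the profiling tool computes its tables; the helper name `category_counts` is made up for the example.

```python
import unicodedata
from collections import Counter

def category_counts(text):
    """Tally Unicode general categories (two-letter codes) in a string."""
    return Counter(unicodedata.category(ch) for ch in text)

# One of the sample titles shown in this report.
counts = category_counts("Mad Max: Fury Road")

for code, n in counts.most_common():
    print(code, n)
```

Summing such tallies over every value in a column would give a category breakdown like the ones listed for each text variable.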

Unique

Unique: 10295
Unique (%): 94.8%

Sample

1st row: Jurassic World
2nd row: Mad Max: Fury Road
3rd row: Insurgent
4th row: Star Wars: The Force Awakens
5th row: Furious 7

Common Values

ValueCountFrequency (%)
Hamlet4
 
< 0.1%
A Christmas Carol3
 
< 0.1%
Shelter3
 
< 0.1%
Carrie3
 
< 0.1%
Hercules3
 
< 0.1%
Frankenstein3
 
< 0.1%
Life3
 
< 0.1%
Jane Eyre3
 
< 0.1%
Julia3
 
< 0.1%
Emma3
 
< 0.1%
Other values (10561)10834
99.7%

Length

[Figure: histogram of lengths of the category]
ValueCountFrequency (%)
the3279
 
10.6%
of969
 
3.1%
a386
 
1.2%
in327
 
1.1%
and317
 
1.0%
to227
 
0.7%
2226
 
0.7%
211
 
0.7%
man148
 
0.5%
for113
 
0.4%
Other values (8859)24842
80.0%

Most occurring characters

ValueCountFrequency (%)
20178
 
11.6%
e17617
 
10.1%
a10758
 
6.2%
o10239
 
5.9%
r9356
 
5.4%
n9343
 
5.4%
i9062
 
5.2%
t8414
 
4.8%
s6857
 
3.9%
h6463
 
3.7%
Other values (154)65587
37.7%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter122010
70.2%
Uppercase Letter27178
 
15.6%
Space Separator20195
 
11.6%
Other Punctuation2612
 
1.5%
Decimal Number1140
 
0.7%
Dash Punctuation212
 
0.1%
Modifier Symbol114
 
0.1%
Other Symbol101
 
0.1%
Currency Symbol78
 
< 0.1%
Other Number71
 
< 0.1%
Other values (9)163
 
0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e17617
14.4%
a10758
 
8.8%
o10239
 
8.4%
r9356
 
7.7%
n9343
 
7.7%
i9062
 
7.4%
t8414
 
6.9%
s6857
 
5.6%
h6463
 
5.3%
l5845
 
4.8%
Other values (35)28056
23.0%
Uppercase Letter
ValueCountFrequency (%)
T3615
 
13.3%
S2228
 
8.2%
B1815
 
6.7%
M1783
 
6.6%
C1613
 
5.9%
D1579
 
5.8%
A1541
 
5.7%
L1281
 
4.7%
H1244
 
4.6%
P1197
 
4.4%
Other values (27)9282
34.2%
Other Punctuation
ValueCountFrequency (%)
:1097
42.0%
'560
21.4%
.368
 
14.1%
,154
 
5.9%
&153
 
5.9%
!123
 
4.7%
?41
 
1.6%
/29
 
1.1%
¡14
 
0.5%
12
 
0.5%
Other values (14)61
 
2.3%
Decimal Number
ValueCountFrequency (%)
2327
28.7%
1172
15.1%
3172
15.1%
0167
14.6%
481
 
7.1%
567
 
5.9%
948
 
4.2%
742
 
3.7%
635
 
3.1%
829
 
2.5%
Currency Symbol
ValueCountFrequency (%)
29
37.2%
¤18
23.1%
¢12
15.4%
£10
 
12.8%
¥5
 
6.4%
$4
 
5.1%
Other Number
ValueCountFrequency (%)
¹22
31.0%
³14
19.7%
¼14
19.7%
½9
12.7%
²7
 
9.9%
¾5
 
7.0%
Modifier Symbol
ValueCountFrequency (%)
¸63
55.3%
¨24
 
21.1%
´14
 
12.3%
˜9
 
7.9%
¯4
 
3.5%
Other Symbol
ValueCountFrequency (%)
©60
59.4%
20
 
19.8%
°11
 
10.9%
¦7
 
6.9%
®3
 
3.0%
Math Symbol
ValueCountFrequency (%)
±9
36.0%
¬8
32.0%
×5
20.0%
+3
 
12.0%
Final Punctuation
ValueCountFrequency (%)
»8
38.1%
7
33.3%
3
 
14.3%
3
 
14.3%
Initial Punctuation
ValueCountFrequency (%)
7
29.2%
7
29.2%
5
20.8%
«5
20.8%
Dash Punctuation
ValueCountFrequency (%)
-206
97.2%
5
 
2.4%
1
 
0.5%
Open Punctuation
ValueCountFrequency (%)
(16
43.2%
14
37.8%
7
18.9%
Space Separator
ValueCountFrequency (%)
20178
99.9%
 17
 
0.1%
Other Letter
ValueCountFrequency (%)
ª13
61.9%
º8
38.1%
Close Punctuation
ValueCountFrequency (%)
)16
100.0%
Modifier Letter
ValueCountFrequency (%)
ˆ11
100.0%
Format
ValueCountFrequency (%)
­6
100.0%
Connector Punctuation
ValueCountFrequency (%)
_2
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin149204
85.8%
Common24670
 
14.2%

Most frequent character per script

Latin
ValueCountFrequency (%)
e17617
 
11.8%
a10758
 
7.2%
o10239
 
6.9%
r9356
 
6.3%
n9343
 
6.3%
i9062
 
6.1%
t8414
 
5.6%
s6857
 
4.6%
h6463
 
4.3%
l5845
 
3.9%
Other values (73)55250
37.0%
Common
ValueCountFrequency (%)
20178
81.8%
:1097
 
4.4%
'560
 
2.3%
.368
 
1.5%
2327
 
1.3%
-206
 
0.8%
1172
 
0.7%
3172
 
0.7%
0167
 
0.7%
,154
 
0.6%
Other values (71)1269
 
5.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII172791
99.4%
None915
 
0.5%
Punctuation99
 
0.1%
Currency Symbols29
 
< 0.1%
Letterlike Symbols20
 
< 0.1%
Modifier Letters20
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
20178
 
11.7%
e17617
 
10.2%
a10758
 
6.2%
o10239
 
5.9%
r9356
 
5.4%
n9343
 
5.4%
i9062
 
5.2%
t8414
 
4.9%
s6857
 
4.0%
h6463
 
3.7%
Other values (73)64504
37.3%
None
ValueCountFrequency (%)
Ã162
 
17.7%
¸63
 
6.9%
©60
 
6.6%
à50
 
5.5%
ì31
 
3.4%
¨24
 
2.6%
¹22
 
2.4%
å21
 
2.3%
¤18
 
2.0%
ã17
 
1.9%
Other values (52)447
48.9%
Currency Symbols
ValueCountFrequency (%)
29
100.0%
Letterlike Symbols
ValueCountFrequency (%)
20
100.0%
Punctuation
ValueCountFrequency (%)
14
14.1%
12
12.1%
11
11.1%
8
8.1%
7
7.1%
7
7.1%
7
7.1%
7
7.1%
7
7.1%
5
 
5.1%
Other values (5)14
14.1%
Modifier Letters
ValueCountFrequency (%)
ˆ11
55.0%
˜9
45.0%
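
The non-ASCII characters listed above (Ã, ©, ¸, ¨ and similar) resemble what UTF-8 text looks like after being decoded with a single-byte encoding such as Latin-1 or Windows-1252. The report alone cannot confirm this, so the following is only a sketch of how such a hypothesis could be tested on individual titles. The function name `repair_mojibake` and the example strings are illustrative and are not taken from the dataset.

```python
def repair_mojibake(text, wrong_encoding="latin-1"):
    """Undo a suspected UTF-8-read-as-single-byte mix-up.

    Returns the input unchanged when the round trip is not valid,
    which is the case for text that was decoded correctly.
    """
    try:
        return text.encode(wrong_encoding).decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text

print(repair_mojibake("PokÃ©mon"))        # hypothetical garbled title
print(repair_mojibake("Hamlet"))          # plain ASCII is left as-is
print(repair_mojibake("â€™", "cp1252"))   # curly apostrophe, cp1252 variant
```

If most affected titles become readable after this round trip, re-reading the source file with an explicit UTF-8 encoding would be the cleaner fix.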

popularity
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 10814
Distinct (%): 99.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.6464455549
Minimum: 6.5 × 10^-5
Maximum: 32.985763
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 85.0 KiB

Quantile statistics

Minimum: 6.5 × 10^-5
5-th percentile: 0.0642494
Q1: 0.207575
Median: 0.383831
Q3: 0.713857
95-th percentile: 2.0466582
Maximum: 32.985763
Range: 32.985698
Interquartile range (IQR): 0.506282

Descriptive statistics

Standard deviation: 1.00023085
Coefficient of variation (CV): 1.547277791
Kurtosis: 210.9783633
Mean: 0.6464455549
Median Absolute Deviation (MAD): 0.215396
Skewness: 9.875866523
Sum: 7023.630954
Variance: 1.000461754
Monotonicity: Not monotonic
[Figure: histogram with fixed size bins (bins=50)]
ValueCountFrequency (%)
0.1093052
 
< 0.1%
0.1140272
 
< 0.1%
0.4685522
 
< 0.1%
0.1500352
 
< 0.1%
0.1873192
 
< 0.1%
0.1869952
 
< 0.1%
0.2651192
 
< 0.1%
0.0117982
 
< 0.1%
0.1388612
 
< 0.1%
0.0784822
 
< 0.1%
Other values (10804)10845
99.8%
ValueCountFrequency (%)
6.5 × 10-51
< 0.1%
0.0001881
< 0.1%
0.000621
< 0.1%
0.0009731
< 0.1%
0.0011151
< 0.1%
0.0011171
< 0.1%
0.0013151
< 0.1%
0.0013171
< 0.1%
0.0013491
< 0.1%
0.0013721
< 0.1%
ValueCountFrequency (%)
32.9857631
< 0.1%
28.4199361
< 0.1%
24.9491341
< 0.1%
14.3112051
< 0.1%
13.1125071
< 0.1%
12.9710271
< 0.1%
12.0379331
< 0.1%
11.4227511
< 0.1%
11.1731041
< 0.1%
10.7390091
< 0.1%

budget
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 557
Distinct (%): 5.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 22291100
Minimum: 1
Maximum: 425000000
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 85.0 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1700000
Q1: 14624286.06
Median: 14624286.06
Q3: 15000000
95-th percentile: 75000000
Maximum: 425000000
Range: 424999999
Interquartile range (IQR): 375713.9357

Descriptive statistics

Standard deviation: 28013845.61
Coefficient of variation (CV): 1.256727824
Kurtosis: 24.90082452
Mean: 22291100
Median Absolute Deviation (MAD): 0
Skewness: 4.228529025
Sum: 2.421928015 × 10^11
Variance: 7.847755458 × 10^14
Monotonicity: Not monotonic
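
A MAD of 0 alongside a standard deviation in the tens of millions can look contradictory. It follows from the value table for this variable: 14624286.06 occurs in 5696 rows (52.4%), so it is both Q1 and the median, and more than half of the absolute deviations from the median are exactly zero. The report does not say why this value is so common. The sketch below reproduces the effect on a small made-up sample; the toy values are illustrative only.

```python
from statistics import median

def mad(values):
    """Median absolute deviation around the median."""
    values = list(values)
    centre = median(values)
    return median(abs(v - centre) for v in values)

# Toy sample: 6 of 10 values share the dominant budget figure.
toy_budget = [14624286.06] * 6 + [1, 5_000_000, 20_000_000, 425_000_000]

print("median:", median(toy_budget))
print("MAD:   ", mad(toy_budget))
```

The same reasoning applies to any variable in this report whose most frequent value covers more than half of the rows.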
[Figure: histogram with fixed size bins (bins=50)]
ValueCountFrequency (%)
14624286.065696
52.4%
20000000190
 
1.7%
15000000183
 
1.7%
25000000178
 
1.6%
10000000176
 
1.6%
30000000164
 
1.5%
5000000141
 
1.3%
40000000134
 
1.2%
35000000128
 
1.2%
12000000120
 
1.1%
Other values (547)3755
34.6%
ValueCountFrequency (%)
14
< 0.1%
21
 
< 0.1%
33
< 0.1%
51
 
< 0.1%
61
 
< 0.1%
83
< 0.1%
106
0.1%
111
 
< 0.1%
122
 
< 0.1%
141
 
< 0.1%
ValueCountFrequency (%)
4250000001
 
< 0.1%
3800000001
 
< 0.1%
3000000001
 
< 0.1%
2800000001
 
< 0.1%
2700000001
 
< 0.1%
2600000002
 
< 0.1%
2580000001
 
< 0.1%
2550000001
 
< 0.1%
2500000007
0.1%
2450000001
 
< 0.1%

revenue
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 4702
Distinct (%): 43.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 61879229.88
Minimum: 2
Maximum: 2781505847
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 85.0 KiB

Quantile statistics

Minimum: 2
5-th percentile: 1069466
Q1: 39826896.08
Median: 39826896.08
Q3: 39826896.08
95-th percentile: 213723484
Maximum: 2781505847
Range: 2781505845
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 111023555.9
Coefficient of variation (CV): 1.79419744
Kurtosis: 84.97811557
Mean: 61879229.88
Median Absolute Deviation (MAD): 0
Skewness: 7.215638107
Sum: 6.723178327 × 10^11
Variance: 1.232622995 × 10^16
Monotonicity: Not monotonic
[Figure: histogram with fixed size bins (bins=50)]
ValueCountFrequency (%)
39826896.086016
55.4%
1200000010
 
0.1%
100000008
 
0.1%
110000007
 
0.1%
50000006
 
0.1%
20000006
 
0.1%
60000006
 
0.1%
200000005
 
< 0.1%
300000005
 
< 0.1%
130000005
 
< 0.1%
Other values (4692)4791
44.1%
ValueCountFrequency (%)
22
< 0.1%
33
< 0.1%
52
< 0.1%
62
< 0.1%
92
< 0.1%
101
 
< 0.1%
113
< 0.1%
121
 
< 0.1%
132
< 0.1%
153
< 0.1%
ValueCountFrequency (%)
27815058471
< 0.1%
20681782251
< 0.1%
18450341881
< 0.1%
15195579101
< 0.1%
15135288101
< 0.1%
15062493601
< 0.1%
14050357671
< 0.1%
13278178221
< 0.1%
12742190091
< 0.1%
12154399941
< 0.1%

director
Categorical

HIGH CARDINALITY

Distinct: 5068
Distinct (%): 46.6%
Missing: 0
Missing (%): 0.0%
Memory size: 85.0 KiB
Woody Allen: 45
other: 44
Clint Eastwood: 34
Martin Scorsese: 29
Steven Spielberg: 29
Other values (5063): 10684

Length

Max length: 533
Median length: 169
Mean length: 14.5192821
Min length: 2

Characters and Unicode

Total characters: 157752
Distinct characters: 96
Distinct categories: 18
Distinct scripts: 2
Distinct blocks: 5
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 3217
Unique (%): 29.6%

Sample

1st row: Colin Trevorrow
2nd row: George Miller
3rd row: Robert Schwentke
4th row: J.J. Abrams
5th row: James Wan

Common Values

ValueCountFrequency (%)
Woody Allen45
 
0.4%
other44
 
0.4%
Clint Eastwood34
 
0.3%
Martin Scorsese29
 
0.3%
Steven Spielberg29
 
0.3%
Ridley Scott23
 
0.2%
Ron Howard22
 
0.2%
Steven Soderbergh22
 
0.2%
Joel Schumacher21
 
0.2%
Brian De Palma20
 
0.2%
Other values (5058)10576
97.3%

Length

[Figure: histogram of lengths of the category]
ValueCountFrequency (%)
john436
 
1.8%
michael308
 
1.3%
david301
 
1.3%
robert212
 
0.9%
peter201
 
0.8%
james162
 
0.7%
richard159
 
0.7%
paul144
 
0.6%
mark110
 
0.5%
lee107
 
0.4%
Other values (6203)21641
91.0%

Most occurring characters

ValueCountFrequency (%)
e14674
 
9.3%
12932
 
8.2%
a12669
 
8.0%
n11046
 
7.0%
r10732
 
6.8%
o9204
 
5.8%
i9106
 
5.8%
l7510
 
4.8%
t5516
 
3.5%
s5207
 
3.3%
Other values (86)59156
37.5%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter116745
74.0%
Uppercase Letter25715
 
16.3%
Space Separator12934
 
8.2%
Math Symbol1088
 
0.7%
Other Punctuation838
 
0.5%
Dash Punctuation178
 
0.1%
Other Symbol118
 
0.1%
Format35
 
< 0.1%
Other Number32
 
< 0.1%
Modifier Symbol28
 
< 0.1%
Other values (8)41
 
< 0.1%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
S2357
 
9.2%
J2208
 
8.6%
M2207
 
8.6%
R1800
 
7.0%
B1686
 
6.6%
C1612
 
6.3%
D1483
 
5.8%
A1433
 
5.6%
G1264
 
4.9%
L1222
 
4.8%
Other values (20)8443
32.8%
Lowercase Letter
ValueCountFrequency (%)
e14674
12.6%
a12669
10.9%
n11046
9.5%
r10732
9.2%
o9204
 
7.9%
i9106
 
7.8%
l7510
 
6.4%
t5516
 
4.7%
s5207
 
4.5%
h4588
 
3.9%
Other values (16)26493
22.7%
Other Punctuation
ValueCountFrequency (%)
.644
76.8%
¡65
 
7.8%
'52
 
6.2%
30
 
3.6%
,18
 
2.1%
§15
 
1.8%
5
 
0.6%
5
 
0.6%
4
 
0.5%
Modifier Symbol
ValueCountFrequency (%)
¨12
42.9%
´9
32.1%
¸5
17.9%
¯2
 
7.1%
Math Symbol
ValueCountFrequency (%)
|1070
98.3%
±17
 
1.6%
¬1
 
0.1%
Other Symbol
ValueCountFrequency (%)
©112
94.9%
¦5
 
4.2%
1
 
0.8%
Currency Symbol
ValueCountFrequency (%)
¥11
57.9%
¤7
36.8%
1
 
5.3%
Space Separator
ValueCountFrequency (%)
12932
> 99.9%
 2
 
< 0.1%
Dash Punctuation
ValueCountFrequency (%)
-177
99.4%
1
 
0.6%
Other Number
ValueCountFrequency (%)
³27
84.4%
¼5
 
15.6%
Open Punctuation
ValueCountFrequency (%)
4
80.0%
(1
 
20.0%
Final Punctuation
ValueCountFrequency (%)
»3
75.0%
1
 
25.0%
Initial Punctuation
ValueCountFrequency (%)
«3
75.0%
1
 
25.0%
Other Letter
ValueCountFrequency (%)
º3
75.0%
ª1
 
25.0%
Format
ValueCountFrequency (%)
­35
100.0%
Control
ValueCountFrequency (%)
3
100.0%
Decimal Number
ValueCountFrequency (%)
91
100.0%
Close Punctuation
ValueCountFrequency (%)
)1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin142464
90.3%
Common15288
 
9.7%

Most frequent character per script

Latin
ValueCountFrequency (%)
e14674
 
10.3%
a12669
 
8.9%
n11046
 
7.8%
r10732
 
7.5%
o9204
 
6.5%
i9106
 
6.4%
l7510
 
5.3%
t5516
 
3.9%
s5207
 
3.7%
h4588
 
3.2%
Other values (48)52212
36.6%
Common
ValueCountFrequency (%)
12932
84.6%
|1070
 
7.0%
.644
 
4.2%
-177
 
1.2%
©112
 
0.7%
¡65
 
0.4%
'52
 
0.3%
­35
 
0.2%
30
 
0.2%
³27
 
0.2%
Other values (28)144
 
0.9%

Most occurring blocks

ValueCountFrequency (%)
ASCII156950
99.5%
None779
 
0.5%
Punctuation21
 
< 0.1%
Letterlike Symbols1
 
< 0.1%
Currency Symbols1
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e14674
 
9.3%
12932
 
8.2%
a12669
 
8.1%
n11046
 
7.0%
r10732
 
6.8%
o9204
 
5.9%
i9106
 
5.8%
l7510
 
4.8%
t5516
 
3.5%
s5207
 
3.3%
Other values (52)58354
37.2%
None
ValueCountFrequency (%)
Ã377
48.4%
©112
 
14.4%
¡65
 
8.3%
­35
 
4.5%
30
 
3.9%
³27
 
3.5%
±17
 
2.2%
Å15
 
1.9%
§15
 
1.9%
Ä14
 
1.8%
Other values (15)72
 
9.2%
Punctuation
ValueCountFrequency (%)
5
23.8%
5
23.8%
4
19.0%
4
19.0%
1
 
4.8%
1
 
4.8%
1
 
4.8%
Letterlike Symbols
ValueCountFrequency (%)
1
100.0%
Currency Symbols
ValueCountFrequency (%)
1
100.0%

runtime
Real number (ℝ≥0)

Distinct: 247
Distinct (%): 2.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 102.0717902
Minimum: 0
Maximum: 900
Zeros: 31
Zeros (%): 0.3%
Negative: 0
Negative (%): 0.0%
Memory size: 85.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 75
Q1: 90
Median: 99
Q3: 111
95-th percentile: 139
Maximum: 900
Range: 900
Interquartile range (IQR): 21

Descriptive statistics

Standard deviation: 31.38270058
Coefficient of variation (CV): 0.3074571391
Kurtosis: 116.2281369
Mean: 102.0717902
Median Absolute Deviation (MAD): 10
Skewness: 6.103513226
Sum: 1109010
Variance: 984.8738958
Monotonicity: Not monotonic
[Figure: histogram with fixed size bins (bins=50)]
ValueCountFrequency (%)
90547
 
5.0%
95358
 
3.3%
100335
 
3.1%
93328
 
3.0%
97306
 
2.8%
96300
 
2.8%
91297
 
2.7%
94292
 
2.7%
98270
 
2.5%
88270
 
2.5%
Other values (237)7562
69.6%
ValueCountFrequency (%)
031
0.3%
25
 
< 0.1%
311
 
0.1%
417
0.2%
517
0.2%
622
0.2%
717
0.2%
89
 
0.1%
97
 
0.1%
106
 
0.1%
ValueCountFrequency (%)
9001
< 0.1%
8771
< 0.1%
7051
< 0.1%
5661
< 0.1%
5611
< 0.1%
5501
< 0.1%
5401
< 0.1%
5011
< 0.1%
5001
< 0.1%
4701
< 0.1%

genres
Categorical

HIGH CARDINALITY

Distinct: 2040
Distinct (%): 18.8%
Missing: 0
Missing (%): 0.0%
Memory size: 85.0 KiB
Comedy: 712
Drama: 712
Documentary: 312
Drama|Romance: 289
Comedy|Drama: 280
Other values (2035): 8560

Length

Max length: 51
Median length: 44
Mean length: 18.50621261
Min length: 3

Characters and Unicode

Total characters: 201070
Distinct characters: 30
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1226
Unique (%): 11.3%

Sample

1st row: Action|Adventure|Science Fiction|Thriller
2nd row: Action|Adventure|Science Fiction|Thriller
3rd row: Adventure|Science Fiction|Thriller
4th row: Action|Adventure|Science Fiction|Fantasy
5th row: Action|Crime|Thriller
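
The sample rows show that each genres value is a pipe-separated list, which helps explain the 2040 distinct combinations. Counting individual genres requires splitting on `|` first. The sketch below does this for the five sample rows only; applying it to the full column is left as an assumption about how one might proceed.

```python
from collections import Counter

# The five sample rows shown above.
sample_rows = [
    "Action|Adventure|Science Fiction|Thriller",
    "Action|Adventure|Science Fiction|Thriller",
    "Adventure|Science Fiction|Thriller",
    "Action|Adventure|Science Fiction|Fantasy",
    "Action|Crime|Thriller",
]

# Split every row on the pipe and tally the individual genres.
genre_counts = Counter(
    genre for row in sample_rows for genre in row.split("|")
)

for genre, n in genre_counts.most_common():
    print(f"{genre}: {n}")
```

After splitting, the five rows reduce to six individual genres, a far smaller set than the number of distinct combined strings would suggest.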

Common Values

ValueCountFrequency (%)
Comedy712
 
6.6%
Drama712
 
6.6%
Documentary312
 
2.9%
Drama|Romance289
 
2.7%
Comedy|Drama280
 
2.6%
Comedy|Romance268
 
2.5%
Horror|Thriller259
 
2.4%
Horror253
 
2.3%
Comedy|Drama|Romance222
 
2.0%
Drama|Thriller138
 
1.3%
Other values (2030)7420
68.3%

Length

[Figure: histogram of lengths of the category]
ValueCountFrequency (%)
comedy712
 
5.8%
drama712
 
5.8%
fiction669
 
5.5%
documentary312
 
2.5%
drama|romance289
 
2.4%
comedy|drama280
 
2.3%
comedy|romance268
 
2.2%
horror|thriller259
 
2.1%
horror253
 
2.1%
comedy|drama|romance222
 
1.8%
Other values (1901)8285
67.6%

Most occurring characters

ValueCountFrequency (%)
r20620
 
10.3%
e17204
 
8.6%
|16113
 
8.0%
a15784
 
7.9%
o14323
 
7.1%
m14069
 
7.0%
i14058
 
7.0%
n11212
 
5.6%
c8711
 
4.3%
t8551
 
4.3%
Other values (20)60425
30.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter155043
77.1%
Uppercase Letter28518
 
14.2%
Math Symbol16113
 
8.0%
Space Separator1396
 
0.7%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
r20620
13.3%
e17204
11.1%
a15784
10.2%
o14323
9.2%
m14069
9.1%
i14058
9.1%
n11212
7.2%
c8711
 
5.6%
t8551
 
5.5%
y8414
 
5.4%
Other values (7)22097
14.3%
Uppercase Letter
ValueCountFrequency (%)
D5280
18.5%
C5147
18.0%
A4554
16.0%
F3564
12.5%
T3074
10.8%
H1971
 
6.9%
R1712
 
6.0%
M1385
 
4.9%
S1229
 
4.3%
W435
 
1.5%
Math Symbol
ValueCountFrequency (%)
|16113
100.0%
Space Separator
ValueCountFrequency (%)
1396
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin183561
91.3%
Common17509
 
8.7%

Most frequent character per script

Latin
ValueCountFrequency (%)
r20620
11.2%
e17204
 
9.4%
a15784
 
8.6%
o14323
 
7.8%
m14069
 
7.7%
i14058
 
7.7%
n11212
 
6.1%
c8711
 
4.7%
t8551
 
4.7%
y8414
 
4.6%
Other values (18)50615
27.6%
Common
ValueCountFrequency (%)
|16113
92.0%
1396
 
8.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII201070
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
r20620
 
10.3%
e17204
 
8.6%
|16113
 
8.0%
a15784
 
7.9%
o14323
 
7.1%
m14069
 
7.0%
i14058
 
7.0%
n11212
 
5.6%
c8711
 
4.3%
t8551
 
4.3%
Other values (20)60425
30.1%

release_date
Categorical

HIGH CARDINALITY

Distinct: 5909
Distinct (%): 54.4%
Missing: 0
Missing (%): 0.0%
Memory size: 85.0 KiB
1/1/09: 28
1/1/08: 21
1/1/07: 18
1/1/05: 16
10/10/14: 15
Other values (5904): 10767

Length

Max length: 8
Median length: 7
Mean length: 6.960055223
Min length: 6

Characters and Unicode

Total characters: 75621
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 3386
Unique (%): 31.2%

Sample

1st row: 6/9/15
2nd row: 5/13/15
3rd row: 3/18/15
4th row: 12/15/15
5th row: 4/1/15
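
release_date is profiled as categorical because the values are month/day/two-digit-year strings. If they are parsed into dates, the two-digit year needs care: Python's `%y` maps 00 to 68 onto 2000 to 2068, while the release_year variable in this report spans 1960 to 2015. The following sketch, with a hypothetical helper `parse_release_date` and a made-up 1966 example, shifts any year after 2015 back by a century.

```python
from datetime import date, datetime

def parse_release_date(text, latest_year=2015):
    """Parse 'M/D/YY' and pull years after latest_year back 100 years.

    latest_year defaults to the maximum release_year in the report.
    """
    parsed = datetime.strptime(text, "%m/%d/%y").date()
    if parsed.year > latest_year:
        parsed = parsed.replace(year=parsed.year - 100)
    return parsed

print(parse_release_date("6/9/15"))   # sample row from the report
print(parse_release_date("1/1/66"))   # made-up example from the 1960s
```

Cross-checking the parsed year against release_year for each row would be a more robust alternative where both columns are available.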

Common Values

ValueCountFrequency (%)
1/1/0928
 
0.3%
1/1/0821
 
0.2%
1/1/0718
 
0.2%
1/1/0516
 
0.1%
10/10/1415
 
0.1%
1/1/0613
 
0.1%
9/7/1213
 
0.1%
1/1/0313
 
0.1%
1/1/1212
 
0.1%
10/16/1512
 
0.1%
Other values (5899)10704
98.5%

Length

[Figure: histogram of lengths of the category]
ValueCountFrequency (%)
1/1/0928
 
0.3%
1/1/0821
 
0.2%
1/1/0718
 
0.2%
1/1/0516
 
0.1%
10/10/1415
 
0.1%
1/1/0613
 
0.1%
9/7/1213
 
0.1%
1/1/0313
 
0.1%
10/16/1512
 
0.1%
10/14/1112
 
0.1%
Other values (5899)10704
98.5%

Most occurring characters

ValueCountFrequency (%)
/21730
28.7%
114799
19.6%
27071
 
9.4%
06708
 
8.9%
95065
 
6.7%
83932
 
5.2%
33527
 
4.7%
53295
 
4.4%
73259
 
4.3%
43219
 
4.3%

Most occurring categories

ValueCountFrequency (%)
Decimal Number53891
71.3%
Other Punctuation21730
28.7%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
114799
27.5%
27071
13.1%
06708
12.4%
95065
 
9.4%
83932
 
7.3%
33527
 
6.5%
53295
 
6.1%
73259
 
6.0%
43219
 
6.0%
63016
 
5.6%
Other Punctuation
ValueCountFrequency (%)
/21730
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common75621
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
/21730
28.7%
114799
19.6%
27071
 
9.4%
06708
 
8.9%
95065
 
6.7%
83932
 
5.2%
33527
 
4.7%
53295
 
4.4%
73259
 
4.3%
43219
 
4.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII75621
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
/21730
28.7%
114799
19.6%
27071
 
9.4%
06708
 
8.9%
95065
 
6.7%
83932
 
5.2%
33527
 
4.7%
53295
 
4.4%
73259
 
4.3%
43219
 
4.3%

vote_count
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 1289
Distinct (%): 11.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 217.3996318
Minimum: 10
Maximum: 9767
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 85.0 KiB

Quantile statistics

Minimum: 10
5-th percentile: 11
Q1: 17
Median: 38
Q3: 146
95-th percentile: 1026
Maximum: 9767
Range: 9757
Interquartile range (IQR): 129

Descriptive statistics

Standard deviation: 575.644627
Coefficient of variation (CV): 2.647863854
Kurtosis: 53.3557307
Mean: 217.3996318
Median Absolute Deviation (MAD): 26
Skewness: 6.177000343
Sum: 2362047
Variance: 331366.7366
Monotonicity: Not monotonic
[Figure: histogram with fixed size bins (bins=50)]
ValueCountFrequency (%)
10501
 
4.6%
11474
 
4.4%
12422
 
3.9%
13377
 
3.5%
14323
 
3.0%
15300
 
2.8%
16270
 
2.5%
17256
 
2.4%
18218
 
2.0%
19189
 
1.7%
Other values (1279)7535
69.4%
ValueCountFrequency (%)
10501
4.6%
11474
4.4%
12422
3.9%
13377
3.5%
14323
3.0%
15300
2.8%
16270
2.5%
17256
2.4%
18218
2.0%
19189
 
1.7%
ValueCountFrequency (%)
97671
< 0.1%
89031
< 0.1%
84581
< 0.1%
84321
< 0.1%
73751
< 0.1%
70801
< 0.1%
68821
< 0.1%
67231
< 0.1%
64981
< 0.1%
64171
< 0.1%

vote_average
Real number (ℝ≥0)

Distinct: 72
Distinct (%): 0.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5.975011505
Minimum: 1.5
Maximum: 9.2
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 85.0 KiB

Quantile statistics

Minimum: 1.5
5-th percentile: 4.4
Q1: 5.4
Median: 6
Q3: 6.6
95-th percentile: 7.4
Maximum: 9.2
Range: 7.7
Interquartile range (IQR): 1.2

Descriptive statistics

Standard deviation: 0.9351380715
Coefficient of variation (CV): 0.1565081625
Kurtosis: 0.5439449721
Mean: 5.975011505
Median Absolute Deviation (MAD): 0.6
Skewness: -0.4361369375
Sum: 64918.5
Variance: 0.8744832128
Monotonicity: Not monotonic
[Figure: histogram with fixed size bins (bins=50)]
ValueCountFrequency (%)
6.1496
 
4.6%
6495
 
4.6%
5.8486
 
4.5%
5.9473
 
4.4%
6.2464
 
4.3%
6.3461
 
4.2%
6.5457
 
4.2%
6.4446
 
4.1%
5.7415
 
3.8%
6.6413
 
3.8%
Other values (62)6259
57.6%
ValueCountFrequency (%)
1.52
 
< 0.1%
21
 
< 0.1%
2.13
< 0.1%
2.23
< 0.1%
2.32
 
< 0.1%
2.47
0.1%
2.52
 
< 0.1%
2.63
< 0.1%
2.73
< 0.1%
2.87
0.1%
ValueCountFrequency (%)
9.21
 
< 0.1%
8.91
 
< 0.1%
8.82
 
< 0.1%
8.71
 
< 0.1%
8.61
 
< 0.1%
8.56
 
0.1%
8.410
0.1%
8.310
0.1%
8.26
 
0.1%
8.116
0.1%

release_year
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 56
Distinct (%): 0.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2001.321859
Minimum: 1960
Maximum: 2015
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 85.0 KiB

Quantile statistics

Minimum: 1960
5-th percentile: 1973
Q1: 1995
Median: 2006
Q3: 2011
95-th percentile: 2015
Maximum: 2015
Range: 55
Interquartile range (IQR): 16

Descriptive statistics

Standard deviation: 12.81325978
Coefficient of variation (CV): 0.006402398355
Kurtosis: 0.799702805
Mean: 2001.321859
Median Absolute Deviation (MAD): 7
Skewness: -1.204116723
Sum: 21744362
Variance: 164.1796261
Monotonicity: Not monotonic
[Figure: histogram with fixed size bins (bins=50)]
ValueCountFrequency (%)
2014700
 
6.4%
2013659
 
6.1%
2015629
 
5.8%
2012588
 
5.4%
2011540
 
5.0%
2009533
 
4.9%
2008496
 
4.6%
2010489
 
4.5%
2007438
 
4.0%
2006408
 
3.8%
Other values (46)5385
49.6%
ValueCountFrequency (%)
196032
0.3%
196131
0.3%
196232
0.3%
196334
0.3%
196442
0.4%
196535
0.3%
196646
0.4%
196740
0.4%
196839
0.4%
196931
0.3%
ValueCountFrequency (%)
2015629
5.8%
2014700
6.4%
2013659
6.1%
2012588
5.4%
2011540
5.0%
2010489
4.5%
2009533
4.9%
2008496
4.6%
2007438
4.0%
2006408
3.8%

budget_adj
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 2614
Distinct (%): 24.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 26750464.35
Minimum: 0.9210910508
Maximum: 425000000
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 85.0 KiB

Quantile statistics

Minimum: 0.9210910508
5-th percentile: 2346207.763
Q1: 17549894.04
Median: 17549894.04
Q3: 20853251.08
95-th percentile: 89378476.25
Maximum: 425000000
Range: 424999999.1
Interquartile range (IQR): 3303357.047

Descriptive statistics

Standard deviation: 30510067.24
Coefficient of variation (CV): 1.140543463
Kurtosis: 17.6875005
Mean: 26750464.35
Median Absolute Deviation (MAD): 0
Skewness: 3.596341207
Sum: 2.906437952 × 10^11
Variance: 9.308642027 × 10^14
Monotonicity: Not monotonic
[Figure: histogram with fixed size bins (bins=50)]
ValueCountFrequency (%)
17549894.045696
52.4%
21033371.6517
 
0.2%
10164004.3417
 
0.2%
2000000016
 
0.1%
4605455.25415
 
0.1%
33496898.6914
 
0.1%
24234951.0614
 
0.1%
26291714.5713
 
0.1%
20328008.6813
 
0.1%
40656017.3613
 
0.1%
Other values (2604)5037
46.4%
ValueCountFrequency (%)
0.92109105081
< 0.1%
0.96939804261
< 0.1%
1.0127866341
< 0.1%
1.3090528471
< 0.1%
2.9081941281
< 0.1%
31
< 0.1%
4.5192848051
< 0.1%
4.6054552541
< 0.1%
5.0066956211
< 0.1%
8.102293071
< 0.1%
ValueCountFrequency (%)
4250000001
< 0.1%
368371256.21
< 0.1%
315500574.81
< 0.1%
292050672.71
< 0.1%
271692064.21
< 0.1%
271330494.31
< 0.1%
2600000001
< 0.1%
257599886.71
< 0.1%
254100108.51
< 0.1%
250419201.71
< 0.1%

revenue_adj
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 4840
Distinct (%): 44.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 79812252.07
Minimum: 2.37070529
Maximum: 2827123750
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 85.0 KiB

Quantile statistics

Minimum: 2.37070529
5-th percentile: 1140131.371
Q1: 51369001.76
Median: 51369001.76
Q3: 51369001.76
95-th percentile: 276582848.1
Maximum: 2827123750
Range: 2827123748
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 136564704.8
Coefficient of variation (CV): 1.711074443
Kurtosis: 74.55950245
Mean: 79812252.07
Median Absolute Deviation (MAD): 0
Skewness: 6.821910006
Sum: 8.671601187 × 10^11
Variance: 1.864991859 × 10^16
Monotonicity: Not monotonic
[Figure: histogram with fixed size bins (bins=50)]
ValueCountFrequency (%)
51369001.766016
55.4%
29106404.282
 
< 0.1%
26331569.652
 
< 0.1%
117753430.82
 
< 0.1%
317214592
 
< 0.1%
10000002
 
< 0.1%
89906740.122
 
< 0.1%
14389144.832
 
< 0.1%
57667591.032
 
< 0.1%
209354710.52
 
< 0.1%
Other values (4830)4831
44.5%
ValueCountFrequency (%)
2.370705291
< 0.1%
2.8619337341
< 0.1%
3.0383599011
< 0.1%
5.9267632241
< 0.1%
6.9510836951
< 0.1%
8.5858012031
< 0.1%
9.056819771
< 0.1%
9.1150797041
< 0.1%
101
< 0.1%
10.296366881
< 0.1%
ValueCountFrequency (%)
28271237501
< 0.1%
27897122421
< 0.1%
25064057351
< 0.1%
21673249011
< 0.1%
19070058421
< 0.1%
19027231301
< 0.1%
17916943091
< 0.1%
15830495361
< 0.1%
15748147401
< 0.1%
14431914351
< 0.1%

Interactions

[Figure: interaction plots for each pair of numeric variables (100 panels)]

Correlations


Spearman's ρ

Spearman's rank correlation coefficient (ρ) measures monotonic correlation between two variables, so it is better than Pearson's r at catching nonlinear monotonic relationships. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
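
The calculation described above can be sketched in plain Python. This is only an illustration under stated assumptions, not the code the report itself ran: the helper names `average_ranks`, `pearson`, and `spearman` are hypothetical, tied values receive the mean of their ranks, and the population (divide-by-n) covariance and standard deviations are used, whose normalisation cancels in the ratio.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of the 1-based positions in the run
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def pearson(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    sx = sqrt(sum((a - mx) ** 2 for a in x) / n)
    sy = sqrt(sum((b - my) ** 2 for b in y) / n)
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: Pearson's r applied to the rank variables."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical data: y = x**2 is monotonic in x but not linear.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
print(spearman(x, y))  # 1 up to floating-point rounding
print(pearson(x, y))   # strictly below 1, because the relation is not linear
```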

Pearson's r

Pearson's correlation coefficient (r) measures linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, so for points lying on a straight line the steepness of the slope does not affect r; only its sign does.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. Extensive documentation is available from the phik project.

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

Sample

First rows

   df_index  original_title                popularity  budget       revenue       director                     runtime  genres                                     release_date  vote_count  vote_average  release_year  budget_adj    revenue_adj
0  0         Jurassic World                32.985763   150000000.0  1.513529e+09  Colin Trevorrow              124      Action|Adventure|Science Fiction|Thriller  6/9/15        5562        6.5           2015          1.379999e+08  1.392446e+09
1  1         Mad Max: Fury Road            28.419936   150000000.0  3.784364e+08  George Miller                120      Action|Adventure|Science Fiction|Thriller  5/13/15       6185        7.1           2015          1.379999e+08  3.481613e+08
2  2         Insurgent                     13.112507   110000000.0  2.952382e+08  Robert Schwentke             119      Adventure|Science Fiction|Thriller         3/18/15       2480        6.3           2015          1.012000e+08  2.716190e+08
3  3         Star Wars: The Force Awakens  11.173104   200000000.0  2.068178e+09  J.J. Abrams                  136      Action|Adventure|Science Fiction|Fantasy   12/15/15      5292        7.5           2015          1.839999e+08  1.902723e+09
4  4         Furious 7                     9.335014    190000000.0  1.506249e+09  James Wan                    137      Action|Crime|Thriller                      4/1/15        2947        7.3           2015          1.747999e+08  1.385749e+09
5  5         The Revenant                  9.110700    135000000.0  5.329505e+08  Alejandro González Iñárritu  156      Western|Drama|Adventure|Thriller           12/25/15      3929        7.2           2015          1.241999e+08  4.903142e+08
6  6         Terminator Genisys            8.654359    155000000.0  4.406035e+08  Alan Taylor                  125      Science Fiction|Action|Thriller|Adventure  6/23/15       2598        5.8           2015          1.425999e+08  4.053551e+08
7  7         The Martian                   7.667400    108000000.0  5.953803e+08  Ridley Scott                 141      Drama|Adventure|Science Fiction            9/30/15       4572        7.6           2015          9.935996e+07  5.477497e+08
8  8         Minions                       7.404165    74000000.0   1.156731e+09  Kyle Balda|Pierre Coffin     91       Family|Animation|Adventure|Comedy          6/17/15       2893        6.5           2015          6.807997e+07  1.064192e+09
9  9         Inside Out                    6.326804    175000000.0  8.537086e+08  Pete Docter                  94       Comedy|Animation|Family                    6/9/15        3935        8.0           2015          1.609999e+08  7.854116e+08

Last rows

       df_index  original_title                                    popularity  budget        revenue       director            runtime  genres                                  release_date  vote_count  vote_average  release_year  budget_adj    revenue_adj
10855  10856     The Ugly Dachshund                                0.140934    1.462429e+07  3.982690e+07  Norman Tokar        93       Comedy|Drama|Family                     2/16/66       14          5.7           1966          1.754989e+07  5.136900e+07
10856  10857     Nevada Smith                                      0.131378    1.462429e+07  3.982690e+07  Henry Hathaway      128      Action|Western                          6/10/66       10          5.9           1966          1.754989e+07  5.136900e+07
10857  10858     The Russians Are Coming, The Russians Are Coming  0.317824    1.462429e+07  3.982690e+07  Norman Jewison      126      Comedy|War                              5/25/66       11          5.5           1966          1.754989e+07  5.136900e+07
10858  10859     Seconds                                           0.089072    1.462429e+07  3.982690e+07  John Frankenheimer  100      Mystery|Science Fiction|Thriller|Drama  10/5/66       22          6.6           1966          1.754989e+07  5.136900e+07
10859  10860     Carry On Screaming!                               0.087034    1.462429e+07  3.982690e+07  Gerald Thomas       87       Comedy                                  5/20/66       13          7.0           1966          1.754989e+07  5.136900e+07
10860  10861     The Endless Summer                                0.080598    1.462429e+07  3.982690e+07  Bruce Brown         95       Documentary                             6/15/66       11          7.4           1966          1.754989e+07  5.136900e+07
10861  10862     Grand Prix                                        0.065543    1.462429e+07  3.982690e+07  John Frankenheimer  176      Action|Adventure|Drama                  12/21/66      20          5.7           1966          1.754989e+07  5.136900e+07
10862  10863     Beregis Avtomobilya                               0.065141    1.462429e+07  3.982690e+07  Eldar Ryazanov      94       Mystery|Comedy                          1/1/66        11          6.5           1966          1.754989e+07  5.136900e+07
10863  10864     What's Up, Tiger Lily?                            0.064317    1.462429e+07  3.982690e+07  Woody Allen         80       Action|Comedy                           11/2/66       22          5.4           1966          1.754989e+07  5.136900e+07
10864  10865     Manos: The Hands of Fate                          0.035919    1.900000e+04  3.982690e+07  Harold P. Warren    74       Horror                                  11/15/66      15          1.5           1966          1.276423e+05  5.136900e+07